Recently, there has been growing interest in improving the resources available to Intrusion Detection System (IDS) techniques. Several cybersecurity studies show that network intrusions and data hijacking are increasingly frequent and complex. The criticality of business operations that rely on computing resources leaves no room for vulnerable information. Cybersecurity has become an indispensable concern within corporate technology, and security teams deal daily with preventing the risk of intrusions into their environments. Thus, the main objective of this study was to investigate the Ensemble Learning technique using the Stacking method, supported by the Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) algorithms, aiming to optimize DDoS attack detection. To this end, the Intrusion Detection System concept was applied with the Orange Data Mining and Machine Learning tool to obtain better results.
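For illustration, the stacking idea described above can be sketched with scikit-learn in place of the Orange GUI; the placeholder features, the logistic-regression meta-learner, and all hyperparameters are assumptions, not the paper's configuration:

```python
# Hedged sketch: stacking SVM and kNN base learners for binary (DDoS vs. benign)
# traffic classification; the feature matrix X and labels y are placeholders.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # placeholder network-flow features
y = rng.integers(0, 2, size=1000)      # placeholder labels: 0 = benign, 1 = DDoS

stack = StackingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    final_estimator=LogisticRegression(),  # meta-learner (an assumption)
    cv=5,
)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
stack.fit(X_tr, y_tr)
print("held-out accuracy:", stack.score(X_te, y_te))
```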
This work presents a thorough review of recent studies and advancements in text generation using Generative Adversarial Networks. Adversarial learning for text generation is promising, as it provides alternatives for generating so-called "natural" language. Nevertheless, adversarial text generation is not a simple task: its foremost architecture, the Generative Adversarial Network, was designed to cope with continuous information (images) rather than discrete data (text). Thus, most works are based on three possible options, i.e., Gumbel-Softmax differentiation, Reinforcement Learning, and modified training objectives. All alternatives are reviewed in this survey, as they represent the most recent approaches for generating text with adversarial techniques. The selected works were taken from renowned databases, such as Science Direct, IEEE Xplore, Springer, the Association for Computing Machinery, and arXiv, and each selected work has been critically analyzed and assessed to present its objective, methodology, and experimental results.
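For reference, the Gumbel-Softmax relaxation mentioned above can be sketched in a few lines of PyTorch; this is a generic illustration (temperature, batch size, and vocabulary size are arbitrary), not code from any surveyed paper:

```python
# Hedged sketch of the straight-through Gumbel-Softmax trick, which keeps
# discrete token sampling differentiable for adversarial text generation.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0, hard: bool = True):
    """Sample (approximately) one-hot token vectors while keeping gradients."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbel) / tau, dim=-1)
    if hard:
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
        # straight-through estimator: forward uses y_hard, backward uses y_soft
        return y_hard - y_soft.detach() + y_soft
    return y_soft

logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, vocabulary of 10 (arbitrary)
tokens = gumbel_softmax_sample(logits, tau=0.5)
tokens.sum().backward()                          # gradients flow back to the generator logits
```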
Learning with noisy labels has become an important research topic in computer vision, where state-of-the-art (SOTA) methods explore: 1) prediction disagreement with a co-teaching strategy that updates two models when they disagree on the prediction of training samples; and 2) sample selection that divides the training set into clean and noisy subsets based on small training loss. However, the quick convergence of co-teaching models toward selecting the same clean subsets, combined with relatively fast overfitting of noisy labels, may induce the wrong selection of noisy-label samples as clean, leading to an inevitable confirmation bias that damages accuracy. In this paper, we introduce our noisy-label learning approach, called Asymmetric Co-teaching (AsyCo), which features a novel form of prediction disagreement that produces more consistently divergent results from the co-teaching models, and a new sample selection approach that does not require the small-loss assumption, enabling better robustness to confirmation bias than previous methods. More specifically, the new prediction disagreement is achieved by using different training strategies, where one model is trained with multi-class learning and the other with multi-label learning. Also, the new sample selection is based on multi-view consensus, which uses the label views from training labels and model predictions to divide the training set into clean and noisy subsets for training the multi-class model, and to re-label the training samples with multiple top-ranked labels for training the multi-label model. Extensive experiments on synthetic and real-world noisy-label datasets show that AsyCo improves over current SOTA methods.
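The multi-view consensus idea can be illustrated with a toy sketch (a simplified paraphrase, not the authors' implementation): a sample is kept as clean when the given training label agrees with both models' predictions, and the remaining samples receive the top-ranked classes as pseudo-labels for the multi-label model. Array shapes and the top-k value are illustrative assumptions:

```python
# Hedged toy sketch of consensus-based sample selection between label "views":
# the training labels and the predictions of the two differently trained models.
import numpy as np

def consensus_split(train_labels, probs_multiclass, probs_multilabel, top_k=2):
    """Return clean indices, noisy indices, and top-k pseudo-labels for the noisy ones."""
    pred_mc = probs_multiclass.argmax(axis=1)
    pred_ml = probs_multilabel.argmax(axis=1)
    clean = np.where((pred_mc == train_labels) & (pred_ml == train_labels))[0]
    noisy = np.setdiff1d(np.arange(len(train_labels)), clean)
    # pseudo-labels for the multi-label model: top-k classes by averaged confidence
    avg = (probs_multiclass[noisy] + probs_multilabel[noisy]) / 2.0
    topk_labels = np.argsort(-avg, axis=1)[:, :top_k]
    return clean, noisy, topk_labels

labels = np.array([0, 1, 2, 1])
p_mc = np.eye(3)[[0, 1, 0, 1]] * 0.9 + 0.05   # fake multi-class probabilities
p_ml = np.eye(3)[[0, 1, 1, 1]] * 0.8 + 0.05   # fake multi-label scores
print(consensus_split(labels, p_mc, p_ml))
```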
Machine Learning algorithms have been extensively researched throughout the last decade, leading to unprecedented advances in a broad range of applications, such as image classification and reconstruction, object recognition, and text categorization. Nonetheless, most Machine Learning algorithms are trained via derivative-based optimizers, such as Stochastic Gradient Descent, which can become trapped in local optima and thus prevent them from achieving proper performance. Bio-inspired alternatives to traditional optimization techniques, known as meta-heuristics, have received significant attention due to their simplicity and ability to avoid entrapment in local optima. In this work, we propose using meta-heuristic techniques to fine-tune pre-trained weights, exploring additional regions of the search space and improving their effectiveness. The experimental evaluation comprises two classification tasks (image and text) and is assessed on four literature datasets. Experimental results show the capacity of nature-inspired algorithms to explore the neighborhood of pre-trained weights, achieving superior results to their pre-trained counterparts. Additionally, a thorough analysis of distinct architectures, such as Multi-Layer Perceptrons and Recurrent Neural Networks, attempts to visualize and provide more precise insights into the most critical weights to be fine-tuned in the learning process.
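To make the general idea concrete, the sketch below refines the weights of an already-trained network with a simple nature-inspired search; a basic (1+λ) evolution strategy stands in for the specific meta-heuristics, and the toy scikit-learn MLP, mutation scale, and budget are all illustrative assumptions rather than the paper's setup:

```python
# Hedged sketch: perturbing pre-trained weights with a derivative-free,
# nature-inspired search and keeping candidates that improve validation accuracy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# "Pre-training" stage: a short gradient-based fit.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=50, random_state=0).fit(X_tr, y_tr)

def get_flat(m):
    return np.concatenate([w.ravel() for w in m.coefs_] + [b.ravel() for b in m.intercepts_])

def set_flat(m, flat):
    i = 0
    for arrs in (m.coefs_, m.intercepts_):
        for k, a in enumerate(arrs):
            arrs[k] = flat[i:i + a.size].reshape(a.shape)
            i += a.size

best = get_flat(mlp)
best_score = mlp.score(X_val, y_val)
rng = np.random.default_rng(0)
for _ in range(30):                                   # meta-heuristic search loop
    for cand in (best + rng.normal(scale=0.01, size=best.shape) for _ in range(8)):
        set_flat(mlp, cand)
        s = mlp.score(X_val, y_val)
        if s > best_score:
            best, best_score = cand, s
set_flat(mlp, best)
print("validation accuracy after meta-heuristic refinement:", best_score)
```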
Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and flexibility in adapting to different problems and datasets. With the increasing number of AutoML algorithms, deciding which one best suits a given problem becomes increasingly difficult. Therefore, it is essential to use complex and challenging benchmarks capable of differentiating AutoML algorithms from each other. This paper compares the performance of four different AutoML algorithms: the Tree-based Pipeline Optimization Tool (TPOT), Auto-Sklearn, Auto-Sklearn 2, and H2O AutoML. We use the Diverse and Generative ML benchmark (DIGEN), a diverse set of synthetic datasets derived from generative functions designed to highlight the strengths and weaknesses of common machine learning algorithms. We confirm that AutoML can identify pipelines that perform well on all included datasets. Most AutoML algorithms performed similarly, without much room for improvement; however, some were more consistent than others at finding high-performing solutions for certain datasets.
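As a rough illustration of how one of the compared tools is run on a single dataset, the sketch below uses TPOT's classic API; the DIGEN datasets themselves are not reproduced here, so a scikit-learn dataset stands in, and the generation/population budget is an arbitrary assumption:

```python
# Hedged sketch of evaluating one AutoML tool (TPOT, classic API) on a dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

automl = TPOTClassifier(generations=5, population_size=20,
                        scoring="roc_auc", random_state=0, verbosity=2)
automl.fit(X_tr, y_tr)                       # evolutionary search over pipelines
print("held-out score:", automl.score(X_te, y_te))
automl.export("best_pipeline.py")            # exports the winning scikit-learn pipeline
```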
Several recent works in the Machine Learning (ML) literature have demonstrated the usefulness of assessing which observations are hardest to have their labels predicted accurately. By identifying such instances, one may inspect whether they have quality issues that should be addressed. Learning strategies based on the difficulty level of the observations can also be devised. This paper presents a set of meta-features that aim to characterize which instances of a dataset are hardest to have their labels predicted accurately and why, also known as instance hardness measures. Both classification and regression problems are considered. Synthetic datasets with different levels of complexity are built and analyzed. A Python package containing all implementations is also provided.
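To give a flavor of what an instance hardness measure computes, the sketch below implements one classical measure, k-Disagreeing Neighbors (kDN), i.e., the fraction of an instance's k nearest neighbors whose label differs from its own; this is a generic illustration and is not taken from the paper's package:

```python
# Hedged sketch of the k-Disagreeing Neighbors (kDN) instance hardness measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def k_disagreeing_neighbors(X, y, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # first neighbor is the point itself
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

X, y = make_classification(n_samples=300, n_informative=5, flip_y=0.1, random_state=0)
hardness = k_disagreeing_neighbors(X, y)
print("indices of the 10 hardest instances:", np.argsort(-hardness)[:10])
```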
In the last decade, exponential data growth has expanded the capacity of machine learning-based algorithms and enabled their use in daily-life activities. This improvement is also partially explained by the advent of deep learning techniques, i.e., stacks of simple architectures that result in more complex models. Although both factors produce outstanding results, they also pose drawbacks, as training complex models over large datasets is expensive and time-consuming. This problem is even more evident in video analysis. Some works have considered transfer learning or domain adaptation, i.e., approaches that map knowledge from one domain to another, to ease the training burden, yet most of them operate over individual frames or small blocks of frames. This paper proposes a novel approach to map knowledge from action recognition to event recognition using an energy-based model, denoted the Spectral Deep Belief Network. Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process. Experimental results on two public video datasets, HMDB-51 and UCF-101, show the effectiveness of the proposed model and its reduced computational burden compared to traditional energy-based models, such as Restricted Boltzmann Machines and Deep Belief Networks.
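For context on the energy-based baselines mentioned above, the sketch below shows a single contrastive-divergence (CD-1) update for a binary Restricted Boltzmann Machine; the proposed Spectral Deep Belief Network itself is not reproduced, and the layer sizes, learning rate, and random batch are illustrative assumptions:

```python
# Hedged sketch of one CD-1 step for a binary RBM, the building block of DBNs.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 64, 32, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One contrastive-divergence update on a batch of binary visible vectors."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + b_v)            # reconstruction of the visible layer
    p_h1 = sigmoid(p_v1 @ W + b_h)
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

batch = (rng.random((16, n_visible)) > 0.5).astype(float)  # placeholder frame features
cd1_step(batch)
```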
We present a method for estimating lighting from a single perspective image of an indoor scene. Previous methods for predicting indoor illumination usually focus either on simple, parametric lighting that lacks realism, or on richer representations that are difficult or even impossible to understand or modify after prediction. We propose a pipeline that estimates a parametric light that is easy to edit and allows renderings with strong shadows, along with a non-parametric texture carrying the high-frequency information necessary for realistic rendering of specular objects. Once estimated, the predictions obtained with our model are interpretable and can easily be modified by an artist/user with a few mouse clicks. Quantitative and qualitative results show that our approach makes indoor lighting estimation easier for a casual user to handle, while still producing competitive results.
In this paper, we study the application of DRL algorithms to the local navigation problem, in which a robot equipped only with a limited-range exteroceptive sensor (e.g., LIDAR) must move toward a goal position in unknown and cluttered workspaces. DRL-based collision-avoidance policies have some advantages, but they are highly susceptible to local minima, since their capacity to learn suitable actions is limited to the sensor range. Since most robots perform tasks in unstructured environments, it is of great interest to seek a generalized local navigation policy able to avoid local minima, especially in untrained scenarios. To this end, we propose a novel reward function that incorporates map information obtained during the training stage, improving the agent's ability to deliberate about the best course of action. In addition, we use the SAC algorithm to train our ANN, which has been shown to be more effective than others in the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments shows that our proposed reward combined with SAC outperforms the compared approaches with respect to local minima and collision avoidance.
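As a rough illustration of the kind of shaped reward discussed above, the sketch below combines progress toward the goal, a collision penalty, and an extra penalty for lingering in regions a training-time map flags as local-minimum prone; all coefficients and the map-derived term are illustrative assumptions, not the paper's exact formulation:

```python
# Hedged sketch of a shaped reward for goal-driven local navigation.
def navigation_reward(dist_prev, dist_now, collided, reached,
                      local_min_penalty=0.0,
                      w_progress=2.5, r_goal=100.0, r_collision=-100.0):
    if reached:
        return r_goal
    if collided:
        return r_collision
    progress = w_progress * (dist_prev - dist_now)   # positive when moving closer to the goal
    return progress - local_min_penalty              # map-informed penalty (assumed term)

# Example step: the robot moved 0.1 m closer to the goal inside a flagged region.
print(navigation_reward(dist_prev=3.2, dist_now=3.1, collided=False,
                        reached=False, local_min_penalty=0.05))
```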
With the spread of deepfake technology, it has become easily accessible and good enough to raise concerns about its malicious use. Facing this problem, detecting forged faces is essential to ensure security and avoid socio-political problems on both global and private scales. This paper proposes a solution for detecting deepfakes using convolutional neural networks, together with a dataset developed for this purpose, Celeb-DF. The results show an overall accuracy of 95% in the classification of these images; the proposed model approaches the state of the art and can potentially be adapted to manipulation techniques that may emerge in the future.
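In the spirit of the approach above, the sketch below shows a small binary CNN classifier for real-vs-fake face crops in PyTorch; the layer sizes, input resolution, optimizer, and random batch are illustrative assumptions, not the paper's architecture or data pipeline:

```python
# Hedged sketch of a minimal binary CNN for deepfake (fake vs. real) detection.
import torch
import torch.nn as nn

class DeepfakeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 1)   # single logit: fake vs. real

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DeepfakeCNN()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a random batch standing in for face crops.
images = torch.randn(8, 3, 128, 128)
labels = torch.randint(0, 2, (8, 1)).float()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```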